Asynchronous and Anticipatory Filter-Stream Based Parallel Algorithm for Frequent Itemset Mining
Authors
Abstract
In this paper we propose a novel parallel algorithm for frequent itemset mining. The algorithm is based on the filter-stream programming model, in which the frequent itemset mining process is represented as a data flow controlled by a series of producer and consumer components (filters), and the data flow (communication) between these filters takes place via streams. When the production rate matches the consumption rate and the communication overhead between producer and consumer filters is minimized, a high degree of asynchrony is achieved. Our algorithm is built on this strategy: it employs asynchronous candidate generation and minimizes communication between filters by transferring only the necessary aggregated information. Another feature of our algorithm is a look-forward approach that accelerates frequent itemset determination. Extensive experimental evaluation confirms the parallel performance and scalability of our algorithm.
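To make the filter-stream idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes each filter runs as a thread and each stream is a `queue.Queue`. The names `counter_filter`, `aggregator_filter`, and `mine_pairs` are hypothetical, and the sketch covers only 2-itemsets and omits the look-forward step. Each counting filter sends a single aggregated message of local support counts downstream, which illustrates the idea of transferring only aggregated information.

```python
import queue
import threading
from collections import Counter
from itertools import combinations

SENTINEL = None  # hypothetical end-of-stream marker


def counter_filter(partition, candidates, out_stream):
    """Producer filter: count candidate supports on one local partition."""
    local = Counter()
    for transaction in partition:
        items = set(transaction)
        for cand in candidates:
            if items.issuperset(cand):
                local[cand] += 1
    out_stream.put(local)     # only the aggregated counts cross the stream
    out_stream.put(SENTINEL)  # signal that this producer is done


def aggregator_filter(in_stream, n_producers, min_support, result):
    """Consumer filter: merge partial counts as they arrive, in any order."""
    total = Counter()
    finished = 0
    while finished < n_producers:
        msg = in_stream.get()
        if msg is SENTINEL:
            finished += 1
        else:
            total.update(msg)  # adds the partial counts to the running totals
    result.update({c: s for c, s in total.items() if s >= min_support})


def mine_pairs(partitions, min_support):
    """Wire the filters together and return the frequent 2-itemsets."""
    items = sorted({i for p in partitions for t in p for i in t})
    candidates = [frozenset(c) for c in combinations(items, 2)]
    stream = queue.Queue()
    result = {}
    consumer = threading.Thread(
        target=aggregator_filter,
        args=(stream, len(partitions), min_support, result),
    )
    producers = [
        threading.Thread(target=counter_filter, args=(p, candidates, stream))
        for p in partitions
    ]
    consumer.start()
    for t in producers:
        t.start()
    for t in producers:
        t.join()
    consumer.join()
    return result


if __name__ == "__main__":
    partitions = [
        [["a", "b", "c"], ["a", "b"]],
        [["a", "c"], ["b", "d"], ["a", "b", "c"]],
    ]
    print(mine_pairs(partitions, min_support=3))
```

Because the consumer merges partial counts in whatever order they arrive, no producer has to wait for another. This is the kind of asynchrony the abstract describes, shown here in a simplified single-machine form.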
Similar Resources
An Accelerator for Frequent Itemset Mining from Data Streams with Parallel Item Tree
Frequent itemset mining attempts to find frequent subsets in a transaction database. In this era of big data, demand for frequent itemset mining is increasing. Therefore, the combination of fast implementation and low memory consumption, especially for stream data, is needed. In response to this, we optimize an online algorithm, called Skip LC-SS algorithm [1], for hardware. In this paper, we p...
Frequent Itemset Mining over Stream Data: Overview
During the past decade, stream data mining has attracted widespread attention from experts and researchers all over the world, and a large number of interesting research results have been achieved. Among them, frequent itemset mining is one of the main research branches of stream data mining, with a fundamental and significant position. In order to further advance and develop the researc...
A New Algorithm for High Average-utility Itemset Mining
High utility itemset mining (HUIM) is an emerging field in data mining that has gained growing interest due to its various applications. The goal of this problem is to discover all itemsets whose utility exceeds a minimum threshold. The basic HUIM problem does not consider the length of itemsets in its utility measurement, and utility values tend to become higher for itemsets containing more items...
A Review on Algorithms for Mining Frequent Itemset Over Data Stream
Frequent itemset mining over dynamic data is an important problem in data mining. The two main factors for a data stream mining algorithm are memory usage and runtime, since both are limited resources. Mining frequent patterns in data streams, as in traditional databases and many other types of databases, has been widely studied in data mining research. Many applications like stock ...
Accelerating Parallel Frequent Itemset Mining on Graphics Processors with Sorting
Frequent Itemset Mining (FIM) is one of the most investigated fields of data mining. The goal of FIM is to find the most frequently occurring subsets among the transactions in a database. Many methods have been proposed to solve this problem, and the Apriori algorithm is one of the best-known methods for FIM in a transactional database. In ...